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^ ■ Abstract 

In this paper, we propose a unified energy minimization model for the segmentation of non-smooth 
image structures. The energy of piecewise linear patch reconstruction is considered as an objective 
measure of the quality of the segmentation of non-smooth structures. The segmentation is achieved by 
minimizing the single energy without any separate process of feature extraction. We also prove that 
the error of segmentation is bounded by the proposed energy functional, meaning that minimizing the 
proposed energy leads to reducing the error of segmentation. As a by-product, our method produces 



a dictionary of optimized orthonormal descriptors for each segmented region. The unique feature of 
our method is that it achieves the simultaneous segmentation and description for non- smooth image 



structures under the same optimization framework. The experiments validate our theoretical claims and 

<N ■ 

show the clear superior performance of our methods over other related methods for segmentation of 
various image textures. We show that our model can be coupled with the piecewise smooth model to 
1 handle both smooth and non-smooth structures, and we demonstrate that the proposed model is capable 

\ of coping with multiple different regions through the one- against- all strategy. 
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Object segmentation, Mumford-Shah model, Active contour, Eigen-patch, piecewise linear patch 
reconstruction, error bound of segmentation. 
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I. Introduction 

Computer vision problems are often addressed by using mathematical models. The quality of 
the solutions to the problems are measured objectively in the mathematical models. With the 
valid mathematical models, we can elucidate the phenomenon of the natural computations, e.g. 
by human vision, that try to accomplish the tasks, and we can reproduce the result of the natural 
computations by computerized simulations. 

An objective measure of piecewise smooth image segmentation is the Mumford-Shah func- 
tional energy [lj. By minimizing the energy, we expect to achieve high quality of segmentation 
for piecewise smooth images. The problem of minimization of the Mumford-Shah functional 
energy is the mathematical abstraction, i.e. the model, of piecewise smooth image segmentation. 
The model is now known as the Mumford-Shah model. This methodology is different from that 
of those in 10 Q which target at the objective evaluation of the segmentation based on the 
ground-truth results from normal subjects. 

In Mumford-Shah model, each image region is modeled as a smooth or constant function with 
parameters. By minimizing the energy of the Mumford-Shah model, we can restore the smooth 
image in each region, and we can also obtain the partition boundaries of the segmentation 
located at the discontinuities in the restored image. However, the image values to be restored 
may not be piecewise smooth or flat as a whole. The images may contain regions of non-smooth 
structures. Such as the images in Fig. QJa). Imposing the smoothness on these images leads to 
the destructive averaging of the image content. This poses problem to the segmentation with the 
conventional Mumford-Shah model. For example, it is possible that non-smooth visual patterns 
different in structure may have similar average image values (See Fig. (Ua)). Consequently, the 
Mumford-Shah model cannot separate such patterns in the image space. 

To cope with non- smooth image data, the segmentation has been considered as a framework 
of two independent processes, i.e. the feature extraction and the segmentation of the feature 
image. There exist combinations of the Mumford-Shah model or other active contour model 
with predefined features H [HI O. The predefined features does not necessarily match the 
underlying non-smooth image structures. For example, in Fig. QJb) the feature values obtained 
by image filtering with the Gabor filters that well match the texture pattern, or the local averages 
of the feature values, can have significant spatial variations. Unsupervised feature selection has 
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(a) (b) 

Fig. 1: Illustration of the problem. The left two images are composed of different image structures that the 
conventional Mumford-Shah model would fail to segment; The diagram on the right shows that the Gabor feature 
image of a texture is nonsmooth. 



also been used [8]] (5|. The major problem concerns us is that these frameworks, formed 
by separate feature selection and segmentation, do not provide a unified optimization model for 
segmentation. These concerns motivate us to explore a unified and valid mathematical model for 
segmentation of non-smooth structures. 

In this paper, we propose a novel unified energy minimization model of the segmentation of 
general non-smooth image structures, motivated by the original Mumford-Shah model. We for- 
mulate the segmentation of general non- smooth image structures as a single energy minimization 
problem of piecewise linear patch reconstruction. The formulation is not heuristic since we prove 
that the error of segmentation is bounded by the energy of piecewise linear patch reconstruction 
for a fixed number of bases. Thus, minimizing the energy of piecewise linear patch reconstruction 
can reduce the error of segmentation. Regarding the energy minimization, we prove that the 
eigen-patches constitute the global optimal solution to the minimization of the error of linear 
patch reconstruction. We also explore a more efficient alternative to the eigen-decomposition 
of matrix for computing eigenvectors. We prove that the linear patch reconstruction based on 
gradient descent converges to the eigen-patches under some mild assumption on the initializa- 
tions. The assumption can often be met, and the convergence is linear. We therefore propose 
the gradient descent algorithm for solving the linear patch reconstruction problem even though 
the problem is not convex. The segmentation algorithm is based on alternating piecewise linear 
patch reconstruction and curve evolution. The piecewise linear patch reconstruction can naturally 
be coupled with the conventional Mumford-Shah model for coping with both smooth and non- 
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smooth structures. A unique feature of the method is that it produces both the segmentation and 
a dictionary of optimized (orthonormal) descriptors for each segmented region upon completion 
in the same optimization framework. 

The rest of the paper is organized as follows. We reviewed the related works in section [III We 
study the PC/PS model and propose the piecewise linear patch reconstruction model in sections 
Himrvl We show our error bound of the segmentation in terms of reconstruction error in section 
IIV-BI and we present our proved theoretical claim of the global optimality of the gradient descent 
for the nonconvex linear patch reconstruction problem in section HV-Ci The experimental results 
are presented in section [V] We conclude the paper in section [VH 



A. The Mumford-Shah model 

The segmentation of piecewise smooth images has been formulated as an energy minimization 
problem in the Mumford-Shah model. For an image defined over the image domain {[#, y] T 6 fi}, 
the two-phase Mumford-Shah model can be formulated as follows. 



where 5(-) is a Dirac delta function, H(-) is a Heaviside (step) function, v is a penalty coefficient. 
gi and g 2 are the reconstruction estimates, (f) is the signed distance function that partitions the 
image into two regions, i.e. (f) > for one region, (f) < for the other. E\ and £ 2 are the 
reconstruction errors of the two region models. H assigns either of the two models to all the 
pixels, and the last term tries to minimize the complexity of the labeling. For piecewise smooth 
(PS) model, we have £\(x,y) = (J — gi) 2 + A||V#i|| 2 , £ 2 [x,y) = (J — g 2 ) 2 + A||Vg 2 || 2 , where 
I(x,y) is the image value at [x,y] T , A is a constant penalty coefficient. Note that, /, gi and 
g 2 are all functions defined on [x, y] T . We will omit (x, y) behind these functions henceforth if 
there is no risk of confusion. The fundamental optimization technique for the reconstruction is 
the Green's functions solution to the Euler-lagrange equation obtained by Calculus of Variations, 



II. Background and related works 
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E((/),gl,g2) = / £ 1 {x,y,g 1 )H{(j))dxdy 




(1) 
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which was presented in ifTUll . The numerical schemes for solving the Euler-lagrange equation 
can also be found in [fTTTl [TT2T] . The smoothness regularization guarantees the global optimality 
of the solution obtained by any of the optimization methods for a given partition. 

If we assume the reconstruction functions gi, g 2 to be constants, i.e. for all (x,y) in 
gi(x,y) = c\ and g 2 (x,y) = c 2 , then the gradient terms in the PS model vanish, and the 
functional becomes the piecewise constant (PC) Mumford-Shah model, which is the prototype 
of the region competition lfT3ll and the Chan-Vese model |[T4l| . 

For minimizing the Mumford-Shah functional, the algorithm is often the alternating (or si- 
multaneous) implementation of segmentation and reconstruction. The reconstruction is achieved 
by image smoothing within each region of segmentation. The level set method is often used in 
|fT4l] for the segmentation. In recent years, the linear relaxation [15J and the convex relaxation 
lfT6ll of the Mumford-Shah model have been proposed, which lead to alternatives to the level set 
method. 

B. Region-based active contours for unsupervised texture segmentation 

The original the Mumford-Shah model was formulated for segmentation of piecewise smooth 
images. Later the model was extended for image texture segmentation. In ifTTll Q IS]], the Gabor 
filtering was incorporated in the Mumford-Shah model. In IfTTll . Lee et al. embedded the manually 
selected 24 Gabor filters into the PS Mumford-Shah model. In 0, Sandberg et al. adopted the 
Chan-Vese model as well as the maximum difference of feature means as the criterion for Gabor 
filter selection. Sagiv et al. [8]] adopted the framework in ^\ with manually selected filters. These 
frameworks require the textures to be piecewise smooth or flat in the feature space, but the filter 
selection for unsupervised segmentation cannot ensure this. Although the ad-hoc postprocessing 
of high dimensional anisotropic diffusion may be applied to ensure the piecewise smoothness, the 
reasons behind this remain obscure. Kokkinos et al. ^ proposed to apply the Region Competition 
model to a type of modulation features. This framework assumes that the textures are globally 
oscillating, as on zebras and tigers, and it still requires filter selection by dominant component 
analysis (DCA) for parametric texture modeling. However, it is actually unknown what kind 
of feature or the principle of feature selection can help the segmentation by Mumford-Shah 
model without supervision. Therefore, the existing combinations of the feature extraction and 
the Mumford-Shah model is heuristic and they do not provide a unified mathematical model for 
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the segmentation. 

There exist other frameworks of texture segmentation based on other similar active contour 
models with fixed texture feature, e.g. J6]] 01 and (5|. For example, in 01 and 0| the structure 
tensor has been chosen as the texture feature, and in ^ the histogram of image values on 
overlapping patches is chosen as the feature. These frameworks are more likely to form a unified 
theory of the segmentation. However, the choice of the structure tensor feature or local histogram 
has not been validated for general textures. 

III. The principle of segmentation behind the Mumford-Shah model 

In this section, we study the segmentation by Mumford-Shah model to understand the ra- 
tionale of this model for segmentation. We restricted ourselves to the two-phase model. The 
generalization of the two-phase framework to multi-phase may follow [fT8ll . which is out of the 
scope here. 

Let us consider the two-phase Mumford-Shah model in a simplified form as follows. 



where E\ and E 2 are the reconstruction errors corresponding to the subregions Vt\ and Vt 2 , such 
that Vl = Qi U^ 2 - Since we focus on the region model in this work, we omit discussing about the 
prior term of arclength for imposing smoothness. Note that, the smoothness term is important 
for dealing with noisy data. For more detailed discussions on the smoothness term, we refer the 
readers to lEDl lECl. 

In the PC model, E\{x,y) = \\I(x,y) — ci|| 2 , £ 2 {x,y) = \\I(x,y) — c 2 || 2 , where I(x,y) is 
the image value, ci, c 2 are the regional means of the image value. In PS model, the Si(x 1 y) = 
(/ — g\) 2 + A||Vgi|| 2 , E 2 {x,y) = (/ — g 2 ) 2 + A||Vg 2 || 2 , where gi,g 2 are the smooth restoration of 
the two image regions (due to the smoothness constraint). The smoothness regularization terms, 
A||Vgi|| 2 and A||V#2|| 2 are often omitted if gi and g 2 are solved by the normalized Gaussian 
convolution, such as in [19], which was interpreted as nonparametric regression [fT2l| . Regarding 
the minimization of this simplified functional ©, we have the following interesting fact upon 
fixing the error functions £i(x,y) and E 2 {x,y). 




(2) 
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Proposition 3.1: 



mm 

H 



E X U + £ 2 (1 - H)dxdy 




(3) 



min {(Si — £2)H} dxdy + C 



where H(x 1 y) = {0, 1} and C is a constant independent of H. 

This fact tells that optimizing the global assignment of H is equivalent to optimizing the 
assignment locally. The assignment rule for determining the optimal H at each pixel location is 
therefore the following. 



The proof is included in the appendix in a separate report. As in the interpretation in lfT3ll . 
Mumford-Shah model tries to achieve clustering based on the measure of the membership defined 
by E\, I = 1,2. Hence, the proper choice of the measure is essential to the segmentation by 
Mumford-Shah functional. 

IV. PlECEWISE LINEAR PATCH RECONSTRUCTION 

In what follows, we establish the mathematical model and the associated solution for the 
segmentation of non- smooth image structures. 

A. The model of piecewise linear patch reconstruction 

In the original Mumford-Shah model, each region is modeled as a piecewise smooth surface 
gi that minimizes the functional energy as follows. 



where I = 1, 2, A is a penalty coefficient. A major reason for this formulation is that the energy 
can be minimized by solving standard partial differential equations, i.e. the diffusion equation. 
Besides, the energy is convex. Hence, if the local optimal solution exists it is also global optimal. 
The problem is that this formulation imposes the smoothness. 




£i(x,y) < £20,2/) 



(4) 




(5) 
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To cope with non-smooth image structure, we consider the image patches as linear combination 
of patch bases, such as in many image appearance models, as follows. 

K K 



k=l 



k=l 



with the varying weights a k = (p,v&) n for different patches over the image, where {v k ,i = 
1,2,..., K} is a set of K orthonormal bases that fully reconstruct the patch. p(x, y) = \p uv {x^ y) \ [u, v] 
D xy ] where p uv (%, y) = I(x — u,y — v) and D xy is a square region centered at [x, y] T . 

It is obvious that the exact reconstruction can happen regardless of smoothness of the image 
patch. To find the bases, we may formulate the reconstruction as energy minimization as follows. 

K 



Mr 



argmm 

Ml 



P-J2(p,v l k ) n v l k 



k=l 



dxdy 



(7) 



Replacing the image reconstruction error in simplified two-phase Mumford-Shah functional ([3]) 
with the patch reconstruction error formulated above, we obtain the formulation of piecewise 
linear patch reconstruction as follows. 



where 



cU _ 
c l — 



min I £?H+ [ £ 2 D (1 - H)dxdy 
H < { v fc}-{ v ?}} 



(8) 



|D| 
1 

ini 



K 



P-^]<P, 

k=l 

/ I(x + u, y + v) 2 dudv 
Jd 



(9) 



( / I{x + u ) y + v)v l k {u, v)dudv ) 
k=i ^ / 



for I = 1, 2, where ||D|| is the size of the patch. From the above, we can note that the error can 
be computed by convolutions. 

From the assignment rule in ©, we know that the patches will be assigned to a region if the 
set of bases of this region produces the patch reconstruction error smaller than that of the other 
regions. 
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B. The error of segmentation by piecewise linear patch reconstruction 

In the above, we have shown that the piecewise linear patch reconstruction is capable of 
determining the assignment of the patches according to the patch reconstruction error, but we 
did not answer whether the assignment is correct. In the following, we establish the theoretical 
foundation of the proposed formulation. Specifically, we show that the error of the assignment, 
i.e. the segmentation error, is bounded by the patch reconstruction error for a fixed number of, 
say K, bases. Thus, by minimizing the error of reconstruction, we may reduce the error of 
segmentation. 

The assignment rule of © enables us to assess the probabilistic analysis of the correctness 
of the segmentation. We ask whether the segmentation by © is consistent with the truth. 
Specifically, we wish to know if p / = p(x',y') e {p|[x,?/] T e Cli} = Pi, whether £i(x' \y f ) < 
£ 2 (x',y')\ and if p' = p(x',y') e {p\[x,y] T e Q 2 } = P2, whether £ 2 {x' \y') < £i(x',y') 9 where 
Q = Qi U Q 2 is the true partition. {p|[x,y] T e Qi} means the set of patches that have their 
centers in f^, where I = 1, 2. This requires us to analyze the following segmentation error rate. 

8 seg = J^j ^2 MSiix' ,y')>S2{x< ,y>)} 

(10) 



[e 2 {x> ,y>)>Si{x> ,y>)] 

[x',y'] T ^ 2 



where \Vt\\, \Q, 2 \ are the sizes of the sets Q 1 and Q 2 . 

The segmentation error rate can be represented by probability, i.e. e seg = P Q \£\{x f ,y r ) > 
£ 2 (x > \y')] + P Q2 \£ 2 {x' ,y') > £\{x! \y')\ for sufficiently large population of Qi and Q 29 where 
P n is the probability of certain events due to Pi, P^ 2 is the probability of the events due to 
P 2 . We hope the error rate to be small. For our patch reconstruction model, we can deduce the 
following error bound for segmentation. 

Proposition 4.1: If the subregion Q l9 Q 2 are sufficiently large, such that E Pi [£°] = f Q £^Hdxdy = 
]k\ E E , 2 [^2 D ] = £ 2 D (1 - H)dxdy = j^E then the following holds. 

< \\mJa^P(x,y)H l dxdy 
£SeS "&l^l(||p|| D -^ 2 - 2(? -) ( } 

where |||| n is the patch norm, R l9 R 2j q l9 q 2 are constants, and < q\ < 00, < q 2 < 00. 
Besides, the denominator in the RHS of the bound is positive. 
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The proof is included in the appendix in a separate report. The above error bound guarantees 
in theory that the segmentation error rate due to the assignment rule © can be decreased by 
minimizing the reconstruction error with respect to a fixed number of bases. The number of bases 
will definitely smaller than the number of dimensions of the the patch, since the denominator 
in the RHS of the bound is positive. 

To conclude, we showed that the segmentation error rate is upper bounded by the total 
patch reconstruction error for a fixed K. The minimization of the patch reconstruction error 
can therefore minimize the segmentation error. 

C. Global optimal linear patch reconstruction 

In what follows, we address the optimal linear patch reconstruction. 

K}= argmin / £,(x,y,{^})H,dxdy 

s.t. Vi^j,(vj,v}) n = 0,Vfc, 

K 



(12) 



= 1 

□ 



where I = {1,2}, £i(x,y,{v l k }) = ||p - £(p, v[) n v^||^, (p,v£) D = JJ^P * v l k dudv and 



k=l 



m/2 ■ 



||f ||^ = f 2 (^ 5 v)dudv, where m is the width of a patch, and we consider square patch in 

this paper. 

A useful identity regarding this formulation is the following. 



^U l{x ,y,H )Wxd y^^ U({v'„}\ (13) 



where 



U(W k })= f E(P' W k )hHidxdy 

= / / \I(x + u,y + v)v l k (u,v)\ 2 dudyHidxdy 

according to Eq. ©, and / = {1,2}, {v[} are orthonormal. This identity can be verified by 
expanding the squared error. 

Before we try to solve the reconstruction problem (fT2l) (or (IT3l) equivalently), we also note that 
the optimization problem defined in Eq. ([72]) is a problem of minimizing constrained concave 
function. The formal statements with their proofs are included in the appendix in a separate 
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report. Such a problem is known to have local optimal solutions 112011 . However, we are able to 
show that the global optimal solution to the linear patch reconstruction problem is attainable. 
The key result regarding the optimality is the following theorem. 

Theorem 4.2: Given a set of functions {w k , k = 1, 2, K} defined as follows, 

N 

h=i (14) 
Vi,j, (w h Wj) D = 0,and Vi, ||wi|| n = 1 
where {e^, h = 1, 2, AT} are all the eigenvectors of K HU A m is defined by the following. 

h m (u,v,u',v') 

r (15) 

= / Ihi(x + u,y + v)Ihi(x + u' ,y + v')dxdy 

where I m = I - H h then the following bound is true. 

K 

U({w k })<J2^k = U({e k }) (16) 

k=i 

where U({w k }) is defined in Eq. (fT3l) . {A fc |l < k < K} are the first X eigenvalues of A^. 
The proof is included in the appendix in a separate report. The above suggests the matrix 
eigen-decomposition as our solution to the global optimal reconstruction. However, the eigen- 
decomposition typically requires converting the image to patches, computing the covariance 
matrix followed by Singular Value Decomposition (SVD). In our segmentation framework, 
we require the optimal reconstruction iteratively during the segmentation. Thus, the eigen- 
decomposition of the matrix obtained from the extracted and labeled patches can be time-storage 
consuming. Alternatively, we suggest the gradient descent as the solution. The gradient descent 
equation to minimize the error (fT2l) is the following. 

dv l n (u,v,t) 



dt 



J (^I H ,v l n ^ I H (x + u ) y + v)dxdy (17) 



The reasons for this choice are the following. 
Theorem 4.3: Given the following eigenvalue problem. 

[A m ]e k = X k e k (18) 

where X k is the eigenvalue, is the eigenvector and [A^]x = f^ u ^ K m ^dudv, and if there 
are finitely many, say N, eigenvectors of [A m ] , the following gradient descent procedure for 
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solving (fT2l) (or (IT3l) equivalently) converges to the global optimal solution of (fT2l) . with the 
initial bases v = J2k=i a k e k for any {a k \a k > 0, 1 < k < K}. 

while 1 < n < K do 



Step n: < 



Solve V n by ([T7D 

^< = <-E(<^ l C)^t (19) 

llv z II 1 
ll v nlln - 1 

end while 

where v l £ for k = 1, 2, n — 1 are the solved patch bases. 

The proof is included in the appendix in a separate report. Moreover, the convergence of the 
gradient descent is linear as claimed in the following corollary. 

Corollary 4.4: If there are TV < oo eigenvectors of A m , and the eigenvalues are ordered such 
that Ai > Ai > ... > Aat, the rate of convergence for the gradient descent iteration in each step 
of the greedy procedure defined by Eq. (1791) does not exceed 1. In other words, the procedure 
converges linearly. 

The proof is included in the appendix in a separate report. Besides, the gradient descent does not 
require computing the matrix A m or the matrix eigen-decomposition but only the convolutions. 

D. Coupling with piecewise smooth model 

It is easy to find two patches, which can be reconstructed equally well (yielding the same 
residue) by the same set of bases, while having significant difference in intensity. Hence, the 
proposed linear patch reconstruction model cannot be used for differentiating the image patches 
that are different only in their illumination and color but similar in their local structure. To 
cope with both the cases via the same model, we propose to combine our model of linear patch 
reconstruction with the original piecewise-smooth Mumford-Shah model to form an integrated 
functional model of segmentation as follows. 

mn J £*H + £*(!- H)dxdy + is J \\VH\\dxdy (2 0) 



m 

h 

\2 i (^ n \cD 



where £* = a(I — gi) + (1 — for I = 1,2 and < a < 1 is a predefined weight. The last 

term penalizes the complexity of the segmentation. 



13 



The numerical solution of H to the energy minimization model can be derived from the level 
set method, in which a signed distance function (f) is used to generate H through Heaviside 
function, i.e. H = H ((/>). 



By this equation, it is implied that the reconstruction errors £* and are fixed when 
implementing curve evolution. The energy in ([20b is minimized by alternatively updating the 
image partition by solving H and updating the reconstruction error in each region by solving 
gi and {v^}. This energy minimization procedure is in fact the gradient descent method for the 
separate variables. Hence, the convergence of the procedure is guaranteed. 



A. Data preparation and implementation details 

Natural textures are examples of the non-smooth structured visual patterns. Evidences show 
that the texture patches can be reconstructed (modeled or represented) well by patch subspaces, 
i.e. the eigenfilters, as reported in |t2TTl where the texture classification was investigated more 
carefully. Therefore, we mainly evaluate our method of segmentation by piecewise linear patch 
reconstruction on a subset of Brodatz textures [l22l| . We empirically choose the textures that 
appear relatively spatially regular with similar size of texture stimuli (textons) for evaluation. 
Therefore, we obtain a collection of textures: {D3, D5-6, D15-22, D24, D34-36, D49, D52-53, 
D55, D57, D65, D68, D76-77, D79, D81-85, D101-106}, which we call the set S. Afterwards, 
we generate two sets of mosaic texture images by paring all different the textures from S for 
evaluation our segmentation method. Each set contains 1260 images. One of the sets is made 
by the original textures, the other is by the textures with the mean intensity subtracted. The 
template for paring the textures is shown in Figure Eta). We also use this template as the ground 
truth for evaluating the segmentation. Both the sets are challenging for segmentation, especially 
the second one. 

Our segmentation algorithm for minimizing the functional energy (l20l) can be implemented by 
alternating an algorithm of image partitioning and the patch reconstruction via either the SVD 
or the gradient descent procedure in (IT9l) . We adopt the curve evolution governed by Eq. (I2TT) as 
the image partitioning algorithm in the implementation. All the methods in our experiments are 




(21) 



V. Experiments 
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Fig. 2: (a) is the template for texture mosaicing. (b) is the initial contour for curve evolutions. 

based on the curve evolution for fair comparison. For piecewise smooth image we may choose 
a small penalty coefficient for the contour length, e.g. 1 in our implementation, for non-smooth 
image we require a large penalty of the contour length for coping with the randomness, e.g. 
100 in our implementation. We use the maximum number of iterations to detect the convergence 
of the curve evolution algorithms. The maximum iteration number is set to be 600, since it is 
observed that the curve evolutions in the experiment converge before this iteration number is 
reached. The convergence of the gradient descent method for patch reconstruction is fast. We 
set the maximum iteration number to be 5. 

B. Evaluation of the gradient descent patch reconstruction 

In this subsection, we evaluate the proposed gradient descent method for computing the optimal 
bases. The principle is to compare the error of reconstruction by gradient descent with the error of 
reconstruction by SVD. We evaluate the gradient descent method for linear patch reconstruction 
on the Set S and Yale face database. We compare the reconstruction errors according to the first 
1-20 optimal bases produced by the gradient descent method with the errors according to the 
bases produced by SVD. The reconstruction errors can be computed by evaluating (fT2l) for the 
entire image domain. We also present the comparison of the reconstructions by orthogonalized 
Gabor filter and the SVD solution. We apply the principle of maximum filtering response to select 
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Fig. 3: The scatter plots of the normalized reconstruction errors for each face or texture image by the first 1-20 
bases. The value 1 corresponds to the worst reconstruction, and corresponds to the perfect reconstruction. The 
left two plots are the orthogonalized Gabor bases (vertical) vs. SVD (horizontal); The other two are the Gradient 
descent (vertical) vs. SVD (horizontal). Results by gradient descent is almost as the same as SVD for different 
numbers of bases. 

the first 20 Gabor filters from a filter bank of 40 filters. We then orthogonalize the selected filters 
to compute the energy. We compare the averaged errors but not the total error. The total error 
is the sum of the reconstruction errors for all the patches in the image. The averaged error is 
the total error divided by the number of patches. 

The results are shown in Figure [3l When visualizing the comparison of the errors, we divide 
the errors by the value of the maximum averaged errors to form a normalized error. The results 
by gradient descent are very close to that of the SVD, while the results of Gabor filters do not 
match the SVD. The initial bases for the gradient descent method are randomly generated. 

C. Evaluation of the segmentation by linear patch reconstruction 

We mainly evaluate the segmentation of our methods, i.e. the SVD and GD based methods, 
on the two subsets of mosaic textures introduced previously. We compare our methods with the 
PS Mumford-Shah model E2 (321, the Region-Scalable Fitting (RSF), a.k.a. the Local Binary 
Fitting (LBF) G3ll . the Gabor filtering based method [7] which is also adopted in J8) and the 
local histogram based Chan-Vese model JfJ, which we denote as HistPC henceforth. The Gabor 
filtering based method could be viewed as a baseline approach for texture segmentation, and the 
local histogram-based method is the state-of-the-art approach. We use 8 bases for each region in 
SVD and GD. We choose the 8 Gabor filters from a bank of 24 filters for each region according 
to the criterion of maximum filtering response. This filter selection criterion appears like model 
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TABLE I: Comparison of segmentation error summarized from Figure HI 





Our methods 


Others' 


Data 


SVD 


GD 


LBF 


PS HistPC 


Gabor 


Original 

Error(%) 

No mean 


12.56zbl0.10 
14.76zbl0.44 


11.66±11.15 
13.33±11.82 


42.78 zbl0.30 
45.64 zbl2.87 


40.00zbl0.52 9.96zbl5.85 
43.81zbl3.50 24.12zb21.56 


33.38±12.94 
42.68zb8.02 



fitting Il24ll . The patch size is 13 x 13. The methods are all based on curve evolution. We used a 
common initial curve for the curve evolutions. The initial curve in the image domain is shown 
in Figure Etb). Note that the initial contour crosses the true boundary of the two regions, and 
the converged contour is expected to outline the region of texture B on the right. 

We adopt the pixel-wise segmentation error rate to measure the quality of the segmentation 
by using the ground truth. The boxplot in Figure |4] shows the segmentation errors for all the 
methods. We can observe a clear lower error rate by our methods for differentiating the textures 
that differs only in their structures but not in their intensities. The results by GD and SVD are 
comparable, but the computational time for the segmentation based on SVD is 0.145 ± 0.048 
seconds per iteration for the two datasets, while the computational time for the segmentation 
based on GD is 0.137 ± 0.006 seconds. This suggests us to use the GD based bases updating 
scheme for segmentation. The mean error by the local histogram based Chan-Vese active contour 
for the original textures is small, but the variation of its performance is large. Besides, this local 
histogram based method is still ineffective for differentiating the different textures having the 
same mean intensity. We can also observe from the results that Gabor features can deal with 
textures when the textures differ in their intensities. However, when there is little difference in 
the intensities of textures, the Gabor features are still powerless. We summarize the quantities 
in Table QJ Besides, we visualize the curve evolutions, as well as the corresponding converged 
patch bases, for two textured images and a picture from the Berkeley dataset ll25ll in Figure \5\ 
The picture is composed of different structures, and the piecewise linear patch reconstruction 
is capable of differentiating the non-smooth structure from the piecewise smooth structure. We 
may observe that the descriptors are semantic. The results by other models are shown in Fig. [6l 
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Fig. 4: Comparison of segmentation error rates on the images with the mean intensity subtracted (left), and on the 
images of original textures (right). 




Fig. 5: Curve evolution on the Brodatz D 104-22 pair (top row) and D85-106 pair (mid row) and a real image taken 
from Berkeley dataset with the corresponding converged patch bases. The curves are drawn in red (better to view in 
color). The left/right two columns of the filters correspond to the background/foreground regions. The D22, D106 
and the lady are assumed to be the foregrounds. 




(a) PS (b) LBF (c) Gabor (d) HistPC 
Fig. 6: The converged curve evolution results of other methods with the same initialization 
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Fig. 7: The box plot of segmentation error against a 



D. Segmentation by the coupled model with parameter tuning 

To cope with both smooth and non-smooth contents, we may use the coupled model proposed 
previously. The quality of the result of the segmentation by the proposed coupled model depends 
on the value of the parameter a. It is obvious that the coupled model tends to be the conventional 
Mumford-Shah model if a = 1, and the model tends to be the pure piecewise linear patch 
reconstruction if a = 0. Fig. [8] shows the results of applying the pure piecewise linear patch 
reconstruction to piecewise smooth images (with noise). We can observe that the optimal bases 
of the regions are similar. Hence, the model can not deal with such images. We hope to find the 
a such that the coupled model can be used for segmenting piecewise smooth images while its 
ability for differentiating non-smooth structures is preserved. In our implementation, we run the 
segmentation with different a values on all the mosaic images composed of the original textures 
(without subtracting the mean off), obtained in the last subsection. We alter the value of a from 
0.1 to 0.9 to obtain the error of segmentation shown in Fig. [71 We observe the steady performance 
for a = and 0.1, and we observe the clear decay of the performance when changing a from 
0.1 to 0.9. This means that by selecting a = 0.1, the performance of the coupled model for 
differentiating non-smooth image structures is almost as good as that of the pure piecewise linear 
patch reconstruction. We also apply the model with a = 0.1 to the piecewise smooth images 
under noise and the good results are shown in Fig. [9l which indicates that the a = 0.1 is already 
sufficient for segmentation of piecewise smooth images. We therefore chose a = 0.1 in the 
coupled model to optimally balance the two terms in the coupled model. 

E. One-against-all segmentation 

We have so far only considered the presence of two different groups of contents in the image 
in our formulation, which is known as the two-phase model. Due to the capability of the level 
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Fig. 8: The segmentation of piecewise smooth images by pure linear patch reconstruction. The left column shows 
the initial contours. The middle column shows the converged curves. The right column shows the optimal bases. 
In the images of bases, the left two columns are the bases of background region, and the rest correspond to the 
foreground region (the regions enclosed by the contour curves). 




Fig. 9: The results of segmenting piecewise smooth images with noise by using the coupled model. The initial 
curves are same as those in Fig. [U 

set method for handling topology changes, the two-phase model is still capable of partitioning 
the image into multiple smooth or non-smooth regions of two groups. However, the two-phase 
model cannot cope with multiple regions of multiple groups. The problem of segmentation of 
image into multiple different regions can be addressed via the one-against-all strategy. In other 
words, we may consider a problem of n-phase segmentation as n subproblems of two-phase 
segmentation. In each subproblem, the image is to be partitioned into the target region and the 
background region which is composed of all the other regions. 

To evaluate the one-against-all strategy for coping with multiple different groups of regions, 
we apply this strategy to the mosaic images containing five different textures. The input and 
output of the segmentation are shown in Fig. [lOl The initial rectangular contours are shown 
in a unique color assigned to a region in one image. The converged contour curves are shown 
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(a) 



(b) 



Fig. 10: Multi-region segmentation via one-against-all strategy (better to view in color). The top row shows the 
two input-output pairs of mosaic images composed of multiple texture regions. The input is the initial rectangular 
contour laying over the image. In each region, a unique color is assigned to the contour. The output is the converged 
curve shown in the same color as in the input. The following rows show the optimal bases corresponding to each 
region in the sequence of top, bottom, left, right and center. 

TABLE II: Error rates (%) of the segmentation shown in Fig. [lOl 





Top 


Bottom 


Left 


Right 


Center 


Total 


Fig-CSa) 


5.16 


2.19 


1.94 


4.3 


3.2 


16.8 


Fig.db) 


2.86 


2.11 


3.94 


2.03 


1.92 


12.88 



in the same colors in the other image. The optimal bases corresponding to different regions 
are also visualized. We can observe that these bases well capture the principal structure of the 
corresponding regions. The error rates of the segmentation are summarized in Table III 
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VI. Discussions and conclusion 

A. Discussions on the limitations 

The size of the patches and the number of bases are predetermined, which is empirical. Both 
of the two properties affect the segmentation. For example, when increasing the patch size, 
the number of bases needed for small reconstruction error would increase due to the enlarged 
dimensionality of the patch. Therefore, to achieve a small error of segmentation the number of 
bases needed has to be large. Larger patch size and more bases would also provide a richer 
description of the non-smooth structure, which may help segmentation in complex situations. 
With a small patch size, which means the reconstruction error can be small with few bases, the 
discrimination might not be clear, since the same set of bases may give similar reconstruction 
errors to different group of patches in such case. 

This work did not really target at natural image segmentation. Specifically, this work explores 
the mathematical model for addressing an important aspect of the natural image segmentation, 
i.e. that of coping with non-smooth structures in the segmentation of images which are 2D 
signals. For natural image segmentation, more sophisticated framework has to be adopted to 
mimic human vision. 

B. Conclusion 

We propose a unified energy minimization model of non-smooth image segmentation without 
requiring any separate process of feature extraction. The model is the energy minimization of 
the error of piecewise linear patch reconstruction. The segmentation error rate of the proposed 
model is proven to be bounded by the patch reconstruction error. The gradient descent method 
for solving the linear patch reconstruction is proven to be globally optimal under mild conditions 
on the initialization. The experiments validate our theoretical claim and show the clear supreme 
performance of our methods over other relevant methods. The linear patch reconstruction in 
our approach can be viewed as an unsupervised dictionary learning process. Segmentation with 
features depends largely on the prior knowledge, whilst our approach is mostly data-driven. Our 
approach is useful when prior knowledge is inapplicable. 



Proof of Proposition \3.1[ 



Appendix 



Since Hi G {0,1} and Ya=i H i = h we have Ya=i^i( x ^v) h i = {^i{ x ^v)^2{x,y)}. Thus, 
for any position [x, y] T G Q we can choose £// = min{£i(x, y), E 2 [x, y)} 9 which is equivalent to 

r 2 ] 2 2 

= argmin <^ ^(x, } at [x, ?/] T . As a result, 5] y)H* < ^ for any 

{Ht} li=i ) 1=1 1=1 

Hi. Thus, J Q Yld=i £i( x i y)Hidxdy < J Q Ym=i £i{ x i y)Hidxdy, which completes our proof. ■ 
Note that the proof relies on the Axiom of Choice for the continuous domain. The choosing 
is feasible if we approximate the integral by discretization. 

Proof of Proposition \4.1\ To prove this proposition, we require the following lemma. 
Lemma 6.1: Given the same condition in Proposition 14. II then the following holds. 

J n £i(x, y)Hdxdy 

(A-l) 



Sseg ~ \^\£ 2 {x^y') 



f n £ 2 (x,y)(l - H)dxdy 
mSi{x',y') 

The proof of this lemma is due to Markov's inequality. This bound connects the error of 
segmentation to the error of reconstruction. 

Additionally, we require the following fundamental model of functions in the literature of 
signal analysis, such as wavelets ll26ll and especially compressive sensing Il27ll Il28ll . 

Universal energy bound Given a (discrete) function / G R N , and any subset of fixed system 
of sorted orthogonal bases {vi\i = 1,2, ...,K < N} such that ||(/, ^i)||/ 2 > ll(/^2)||/ 2 > ••• > 
\\(f vk)\\i 2 > then the following bound holds for every < n < N, 

\\(f,v n )\\ h <Rn-«, (A-2) 

where R = ||(/, ^i)||z 2 , and < q < oc. 



The worst case is when \\(f,vi)\\i 2 = \\(f,v 2 )\\i 2 = - = \\(f,v K )\\i 2 , i.e. q = 0. 

From Definition of universal energy bound, the linear patch reconstruction error at every pixel 
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could be bounded by using (IA-21) as follows. 



K 



Pl|£° = ||p-5]( p ' Vfc )n Vfc 

fc=i 

= llp|ln-X)<P' Vfc )n 

fc=i 

>iip^-f:^- 29i 



|2 

!□ 



(A-3) 



fe=i 

> || P ||* -i^tf 2 " 2 * 

where Z = 1,2. Substituting (IA-31) into the denominator of (IA-11) . and if the denominator is 
positive, we complete our proof. ■ 
Theorem 6.2: The optimization problem defined in Eq. (fT2l) is nonconvex. 



Proof of Theorem \6.2\ First, we shall prove that the objective functional is nonconvex. 
Taking functional derivatives of the energy twice, we have the following. 

D' 



(A-4) 



= — J I{x + vf, y + v')I(x + u, y + v)Hidxdy 

= — J I{x + v! ,y + v f )HiI(x + u,y + v)Hidxdy 
n 

= -fl m{x + u',y + v>)I m{x + u,y + v)d X dy 



= -A Hl (u',v',u,v) 

If this functional is convex, —A m (u f ,v f ,u,v) will be positive semi-definite. However, we can 
show that the — h m (u f , v\ u, v) is actually negative semi-definite as follows. 

— J h{u \ v)A H i(u\ v\ u, v)h(u, v)dudvdudv 



I{x + u ) y + v)h{u ) v)dudv 



Hidxdy 



(A-5) 



< 
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Hence, the objective functional is concave. Further, we consider the constraints in terms of 
orthonormality. Let {x^} and {y^} both satisfy the constraints. We wonder whether the convex 
combination also satisfy the constraints. For example, let wi = axi + (1 — a)yi, where a > 0. 
Then we can verify that ||wi|| ^ 1, unless Xi = y l5 which is generally not true. Therefore, the 
constraints are also nonconvex. Either of the above two facts can complete our proof. ■ 
Theorem 6.3: In the optimization problem defined in Eq. (fT2l) . if {v&, k = 1, 2, K} is the 
global optimal solution, then the transformation of {v k , k = 1, 2, X} via an orthogonal matrix 
TZ KxK is also the global optimal solution. 



Proof of Theorem \63x First, we define a linearly transformed patch basis {w/J as follows. 

wi(w, = [viO, v),V2(w, v), v^(ix, ^)] T ai 
w 2 (u,v) = [v[(u,v),v l 2 (u,v), ...,v l K (u,v)] T & 2 

(A-6) 

Wk(w^) = [vi(tx,v), Vj^v), v l K (u,v)] T a K 
where A = [ai, a 2 , a^] T is an K x X mixing matrix. We may simply write the following. 

wi = Va x , w 2 = Va 2 , . . . , w K = Va /C (A-7) 

where V = [v[(u, v),v l 2 (u, v), v l K (u, v)] T '. 

Hence, the energy in terms of w&(^, could be rewritten as follows. 

K 



^({ w *}) = J Y] (p, w k )^Hidxdy 

r K 2 
= / J^p,Va^ Hidxdy 



«S X 7 D (A-8) 
||AV T p|| 2 Hidxdy 



I 

Jn 



(p T V[A T A]V T p) Fidxdy 
If the mixing matrix A is orthogonal, we obtain the following. 



U ( {Wi} ) = J n ( pTVVTp ) Hldxdy = U {i v i}) ( A " 9 ) 

Therefore, if {v-} is the global optimal solution, the {w^} is also the global optimal solution, 
which completes the proof. ■ 
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To prove Theorem 14.31 we require some lemmas. 

Lemma 6.4: The integral operator [A m ] is symmetric positive semi-definite. 
The symmetry is straightforward. The proof of positive semi-definiteness is as in the proof of 



Theorem 16. 2[ 

Hence according to Mercer's theorem of eigen-decomposition of symmetric nonnegative def- 
inite bounded integral operator, we can write A m (u,v,u',v') in the form of infinite series of 
eigenfunctions as follows. 

Theorem 6.5 (Mercer's representation): Assuming || [Ahi(u, u', v')} \\ < oo, then 

oo 

A m (u, V, u\ v') = ^ X i e i( U > v ) e i( u '> v') (A- 10) 

1=1 

where A x > A 2 > . . . > are the eigenvalues, {e^u, v),i = 1, 2, ...} is the set of eigenfunctions, 
and the eigenfunctions form a system of orthogonal basis. 

A proof of the above may be found in E9l . Now we are in a position to prove the Theorem H31 
Proof of Theorem \4.3l : First, we write the gradient descent differential equation for the 
STEP 1 as follows. 

dv[(u, v, t) 

at (A-ll) 

= / A H i(u,v,u\v')v[(u ,v f ,t)dudv f 

where v[(u, v, 0) = v . The corresponding update equation for the n— th iteration is the following. 

Vi («, v) = v{ (u, v) 



r , (A-12) 

A t / A m v l { n (u',v')du'dv' 



Suppose v = X^i a i e i( u , u )> the update equation for the first iteration could be rewritten as 
follows. 

oo 

v l { l (u,v) = y]aiei(u,v) 
i=i 

/oo 
A m ^2 OLiGi{u, v)dudv 
i=i 



(A-13) 



i=l i=l 

oo 

= + ^\)aiGi{u,v) 



i=i 
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where we applied Mercer's representation. Hence, the update equation for the n+ 1— th iteration 
is the following. 



i=i 



:i + A t Ai; 



1 + A t \i 



+ A t Ai 



(A-14) 



» (1 + AiAi) n «iei(M,f) 
The normalization constraint gives us the desired result. 



lim v l { n (u, v) 



v l { n (u,v) 
\v[(u,v)\\ 



±e 1 (u,v) 



(A-15) 



where we may omit the ± sign. The proofs for the subsequent steps is hence straightforward, 
where we only need to replace v = a i e i{ u > v ) with v = YUlLj a i e i{ u > v )d = 2, 3, K. 
This setting is due to the Gram-Schmidt process in the constraints. The energy U({v\}) is 
therefore the sum of eigenvalues due to Mercer's representation. ■ 
Proof of Corollary \4A\ According to the proof of Theorem 14.31 the n— th iteration for 
STEP k is the following. 



v l £ n (u,v) = (1 + A t X k ) 



a k e k (u,v) 



N 



^ I l + A,A t Y 



(A-16) 



Let vj: n (-u, v) 



v l f(u,v)/(l + A t \ k T .&=||S&£ 



. Note that the eigenvalues are ordered. 
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Hence, (3 h < 1. From these we deduce the following. 

\\v l £ n+1 (u,v) - a k e k (u,v)\\ _ T,h=k+iPh +1 \ a h\ 



Wi n {u, v) - a k e k (u, v)\\ EjU+i @h\ a h\ 

N 



1 YLk + M<*»\(i - fo) 
Lfc+i^l"fcl 

ffi\a h \(l-(3 h ) 



N 



, (N - k - l)ff| Q . 



(A-17) 



h=fc+l 

1 * 



TV 

/i=/c+l 

where = max{^|a^|, h = k + l,i + 2, N}. The above completes the proof. ■ 

Proof of Theorem \42\ : The energy of Eq. (TT31) in terms of {w^, i = 1, 2, fc} is the 
following. 

^({wfe}) = / 5^ p ' Wk)n dxd y 

K 

= / Wfc(i&, v)w k (u l \ v')K H idudvdu' dv' (A-18) 

/c=l 

K A/" 
/c=l /l=l 

where w k = E^=i ct kh e h . Note that E^=i aj^ = 1. In words, the summation over In is a convex 
combination of A^. Note that the are ordered from large values to small values for h = 1 to 
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N. Hence, the following holds. 



N 



K 



N 



h=l 



Y a kh X h = Y a kh X h + Y a lh' X h> 
h=l h'=K+l 

K N / , 

= Y a kh X h + Y a kh' I Y @ h ) Xh ' 
h=l h'=K+l \h=l 

K 

'where, Y^ = 1? and, > 



(A-19) 



h=l 



K 



N 



K 



< Y a kh X h + Y a kh' Y ^ hXh 
h=l h'=K+l \h=l 

The last inequality is due to that the convex combination of bigger values is bigger than smaller 
values. Now we consider a special case of f3 h as follows. 



kh 



(A-20) 



This satisfies the following. 



K 



Efc=l (l ~~ E/i=l a khj 



(A-21) 



Besides, regarding the inequality in (IA-191) we have the following. 



K 



N 



a 



kh' 



k=i 



h'=K+l 



K 



= E 

k=l 
K 

= E 

k=l 

K 

= E 



\ h'=K J 



a kh + 



1 s^K 2 

1 - Efe=i °4 



Efc=i E/i=i a L) 



1 - ^ 

^=1 



(A-22) 



fc=i 



1 _ Y^ / c 2 

2 1 1 L/i'=l% 



Efc=i E/i=i " 



a 



fe/i 



fc=i 



= 1 



fc=i 



fe=i 
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Substituting the above into (1A-191) , we have the following. 

k k / N 



h=l k=l \ h'=K+l 
K 



J2\ h = U({e k }) 



h=l 

which completes our proof. ■ 
The global optimality of PCA for the reconstruction of zero-mean vectors has been reported 
in Il30ll . Our proof of global optimality of the eigenpatches for reconstruction of arbitrary patches 
is quite different from theirs, and our procedure of the proof is simpler. 
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